ONE-DIMENSIONAL STEPPING STONE MODELS, SARDINE
GENETICS AND BROWNIAN LOCAL TIME 

By Richard Durrett and Mateo Restrepo 

Cornell University 

Consider a one-dimensional stepping stone model with colonies of
size $M$ and per-generation migration probability $\nu$, or a voter model
on $\mathbb{Z}$ in which interactions occur over a distance of order $K$. Sample
one individual at the origin and one at $L$. We show that if $M\nu/L$ and
$L/K^2$ converge to positive finite limits, then the genealogy of the
sample converges to a pair of Brownian motions that coalesce after the
local time of their difference exceeds an independent exponentially
distributed random variable. The computation of the distribution of
the coalescence time leads to a one-dimensional parabolic differential
equation with an interesting boundary condition at 0.

1. Introduction. Cox and Durrett [6] and Zähle, Cox and Durrett [15]
have recently studied the two-dimensional stepping stone model. Space is
represented as a torus $\Lambda(L) = (\mathbb{Z} \bmod L)^2$. To avoid a factor of 2 and to
make the dynamics easier to describe, we suppose that at each point
$x \in \Lambda(L)$ there is a colony of $M$ haploid individuals labeled $1, 2, \ldots, M$. Each
individual in the system is replaced at rate 1. With probability $1 - \nu$ it is
replaced by a copy of an individual chosen from the same colony. If the
individual is in colony $x$, then with probability $\nu$ it is replaced by a copy
of one chosen from a nearby colony $y \ne x$, selected with probability $q(y - x)$, where the
difference is computed componentwise modulo $L$, and the representative of
the equivalence class is chosen from $(-L/2, L/2]^2$. Here $q(z)$ is an irreducible
probability on $\mathbb{Z}^2$ with $q(0,0) = 0$, finite range and the same symmetry as $\mathbb{Z}^2$:
$q(x_1, x_2) = q(-x_1, -x_2)$ and $q(x_1, x_2) = q(x_2, x_1)$. These assumptions imply
that jumps according to $q$ have mean 0 and covariance $\sigma^2 I$.

When $M = 1$ the stepping stone model reduces to the voter model, but
being able to consider colony size $M > 1$ enriches the behavior of the model.
As in the voter model, we can define a genealogical process for each
individual that traces the source of its genetic material backward in time. For
one individual this is a random walk that moves to a randomly chosen
individual in the same colony with probability $1 - \nu$ and otherwise jumps to
a new colony chosen according to $q$. The genealogies of two individuals are
random walks that coalesce with probability $1/M$ on each jump when they
land in the same colony. We will call $q$ the dispersal distribution since it
is the jump distribution for the genealogical process. If the migration rate
times the colony size, $M\nu$, is large enough, then the population behaves as
a homogeneously mixing unit. Let $t_0$ be the coalescing time of two lineages
and let $\pi$ denote that the two individuals are chosen at random from the
population. Cox and Durrett [6] have shown

Theorem 1. If $L \to \infty$ and $(2\pi\sigma^2) M\nu/\log L \to \alpha \in (0, \infty]$, then

\[ P_\pi\Bigl(2t_0 > \frac{1+\alpha}{\alpha}\, M L^2 t\Bigr) \to e^{-t}. \]

In genetics terms, the system behaves as a homogeneously mixing
population of "effective" size $ML^2(1 + \alpha)/\alpha$. As $\alpha \to \infty$ this converges to the
actual population size, indicating that the critical size of $M\nu$ for interesting
behavior is $O(\log L)$. One finds more interesting behavior when individuals
are sampled from an $L^\beta \times L^\beta$ square of colonies, but those results are not
relevant here, so we refer the reader to Cox and Durrett [6] and Zähle, Cox
and Durrett [15] for details.

Here, we will be interested in investigating similar questions for the
one-dimensional stepping stone model. Although we live in a two-dimensional
world, this case is relevant for applications. Many species, such as sea lions
and abalone, live along a coastline that is essentially one-dimensional. For
example, Bowen and Grant [5] have studied sardines at five different sites
in the Indian and Pacific oceans. Wilkins and Wakeley's [14] analysis of these
data using the one-dimensional stepping stone model was the inspiration for
this study.

Although the most natural setting to pursue our results would be a
one-dimensional interval or a ring of colonies, we will, for technical reasons, study
the stepping stone model on $\mathbb{Z}$. The setup is the same as that of Cox and
Durrett [6] described above. There are $M$ haploid individuals per colony
and nearest-neighbor migration occurs with probability $\nu$. We sample one
individual from the colony at 0, and another from the colony at $L$. If $M = 1$,
then the two lineages will coalesce the first time they enter the same colony.
Our first question is: how large does $M\nu$ need to be for the system to
have more interesting behavior? Since migration occurs with probability $\nu$,
it takes time $O(L^2/\nu)$ for the difference in the locations of the two lineages to
change by $O(L)$. In this time the difference will visit a given value between
0 and $L$ an average of $L/\nu$ times, so if we want the probability of coalescence
to be positive but not certain, this should be $O(M)$.

Theorem 2. Consider a one-dimensional stepping stone model with $M$
haploid individuals per colony and nearest-neighbor migration with
probability $\nu$. Sample one individual from the colony at 0, and another from the
colony at $L$. If $L \to \infty$ and $M\nu/L \to \alpha \in (0, \infty)$, then $2t_0/(L^2/\nu)$ converges
in distribution to $\ell_0^{-1}(\alpha\xi)$, where $\ell_0(t)$ is the local time at 0 for a standard
Brownian motion starting at 1, and $\xi$ is independent with a mean 1
exponential distribution.

Note that as $\alpha \to 0$ the limit becomes the hitting time of 0 and that as
$\alpha \to \infty$ the limit $\to \infty$.

We are, of course, not the first to have considered this problem. Writing
things in our notation, Maruyama [12] considered a ring of $L$ colonies with
$M$ diploid individuals per colony. He did not formulate his result as a limit
theorem, but by filling in a few details in the Appendix, we can use his
computations to show that if $M\nu/L \to \alpha$,

\[ E_0\bigl(\exp(-\lambda t_0/(L^2/\nu))\bigr) \to (1 + 4\alpha\sqrt{\lambda})^{-1}. \tag{1} \]

It would be interesting to derive this formula using a generalization of
Theorem 2 to the circle, and computations for the local time at 0 of a Brownian
motion on the circle.

Wilkins and Wakeley [14] modeled space as $\{0, 1/L, 2/L, \ldots, 1\}$ with one
individual per site, and used a dispersal distribution that is a normal
distribution with a small variance $\sigma^2$, with reflecting boundary conditions at
the ends. They analyzed the system by simulation and numerical solution
of differential equations for various combinations of $L$ and $\sigma^2$. Here we will
consider the corresponding problem on $\mathbb{Z}$, sample one individual from 0 and
one from $L$, and suppose dispersal distance is of order $K$. If the dispersal
is nearest neighbor, the two lineages cannot cross each other without
coalescing. To see how large $K$ has to be for the system to have interesting
behavior, we note that it takes roughly $L^2/K^2$ jumps to move distance $L$,
and at this point the difference between the two locations will have visited
a typical value between 0 and $L$ about $L/K^2$ times. If we take $K = c\sqrt{L}$,
then the expected number of visits to 0 converges to a positive finite limit,
and the probability of coalescence is positive but not certain.

To state the result and to write its proof, it is convenient to introduce
another parameter $N$ and let $K = N^{1/2}$ and $L = O(N)$. We make the following
assumptions about the dispersal distribution $q^N$:

1. symmetry: $q^N(z) = q^N(-z)$,
2. the variance $\sum_{z \in \mathbb{Z}} z^2 q^N(z) = \sigma_N^2 N$ with $\sigma_N \to \sigma \in (0, \infty)$,

3. there is an $h > 0$, independent of $N$, so that $q^N(z) \ge h/\sqrt{N}$ for $|z| \le \sqrt{N}$,

4. exponential tails: $q^N(z) \le C \exp(-c|z|/\sqrt{N})$.

These assumptions contain uniform, bilateral exponential and normal
distributions as special cases. The last condition is strong but is convenient since
it allows us to choose $B$ so that

\[ \sum_{|z| > B\sqrt{N}\log N} q^N(z) < N^{-2}. \]

Since the limit theorem involves times of order $N$, we can suppose without
loss of generality that

5. $q^N(z) = 0$ for $|z| > B\sqrt{N}\log N$,

since the probability of having a jump larger than $B\sqrt{N}\log N$ by time $N$ is
$< 1/N$. The constant $B$ is special and the letter $B$ is reserved for its value.
Here and in what follows, $c$ and $C$ are positive finite constants whose values
are unimportant and will change from line to line, while $O(f(N))$ indicates
a quantity that can be bounded by $Cf(N)$, with $C$ independent of $N$.
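As a concrete instance of assumptions 1 and 2, the sketch below builds a uniform dispersal distribution on $\{-m, \ldots, m\}$ with $m = \lfloor c\sqrt{N} \rfloor$ and computes $\sigma_N^2$ exactly. The helper names `uniform_dispersal` and `sigma_N_squared` and the parameter `c` are hypothetical, chosen only for illustration; the text itself only says that uniform distributions are covered.

```python
from fractions import Fraction
import math


def uniform_dispersal(N, c=1.0):
    """Uniform distribution on {-m, ..., m} with m = floor(c * sqrt(N)).

    Hypothetical helper: returns a dict z -> q^N(z) with exact rational weights.
    """
    m = math.floor(c * math.sqrt(N))
    weight = Fraction(1, 2 * m + 1)
    return {z: weight for z in range(-m, m + 1)}


def sigma_N_squared(q, N):
    """sigma_N^2 defined by  sum_z z^2 q(z) = sigma_N^2 * N  (assumption 2)."""
    return sum(z * z * p for z, p in q.items()) / N


q = uniform_dispersal(N=10_000, c=1.0)
# Assumption 1 (symmetry) holds by construction.
symmetric = all(q[z] == q[-z] for z in q)
# For the uniform law the variance is m(m+1)/3, so sigma_N^2 tends to c^2 / 3.
s2 = sigma_N_squared(q, 10_000)
```

For this family the tails vanish beyond $c\sqrt{N}$, so assumptions 4 and 5 hold trivially once $N$ is large.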

Theorem 3. Consider a sequence of voter models on $\mathbb{Z}$ with jumps at
rate 1, and dispersal distributions $q^N$ satisfying assumptions 1-5. If the
positive numbers $L_N$ have $L_N/(\sigma N) \to x_0 \ge 0$, then $2t_0/N$ converges in
distribution to $\ell_0^{-1}(\sigma\xi/2)$, where $\ell_0$ is the local time at 0 of a standard Brownian
motion started from $x_0$ and $\xi$ is independent with a mean 1 exponential
distribution.

Again, as $\sigma \to 0$ the limit becomes the hitting time of 0, and as $\sigma \to \infty$ the
limit $\to \infty$.



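To get a more explicit description of the distribution of the limits in
Theorems 2 and 3 we would like to compute

\[ P_x(\ell_0^{-1}(\xi/\lambda) > t) = P_x(\lambda \ell_0(t) < \xi) = E_x \exp(-\lambda \ell_0(t)). \]

Formula 1.3.7 in Borodin and Salminen's [4] Handbook of Brownian Motion
tells us that

\[ E_x\bigl(\exp(-\lambda \ell_0(t));\, B_t \in dz\bigr) = \frac{1}{\sqrt{2\pi t}}\, e^{-(z-x)^2/2t}\,dz
 - \frac{\lambda}{2} \exp\bigl((|z| + |x|)\lambda + \lambda^2 t/2\bigr)\, \mathrm{Erfc}\Bigl(\frac{|z| + |x|}{\sqrt{2t}} + \lambda\sqrt{t/2}\Bigr)\,dz, \tag{2} \]

where $\mathrm{Erfc}(y) = (2/\sqrt{\pi})\int_y^\infty e^{-s^2}\,ds$ is the complementary error function, that is,
an upper tail of the normal distribution.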

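Another approach to computing $u(t,x) = E_x \exp(-\lambda \ell_0(t))$ is to note that
for $x \ne 0$ it satisfies the heat equation

\[ \frac{\partial u}{\partial t} = \frac{1}{2} \frac{\partial^2 u}{\partial x^2}. \]

To determine the boundary condition at 0, we run Brownian motion until
$\tau_h = \inf\{t : B_t \notin (-h, h)\}$ and use symmetry $u(t,x) = u(t,-x)$ to conclude
that

\[ u(t,0) = E_0\bigl(e^{-\lambda \ell_0(\tau_h)}\, u(t - \tau_h, h);\, \tau_h \le t\bigr) + E_0\bigl(e^{-\lambda \ell_0(t)};\, \tau_h > t\bigr), \]

where the last term is at most $P_0(\tau_h > t)$.

The strong Markov property implies that $\ell_0(\tau_h)$ is exponentially distributed.
Let $D_\varepsilon(\tau_h)$ be the number of downcrossings of $(0, \varepsilon)$ by reflecting Brownian
motion before it hits $h$. $D_\varepsilon(\tau_h)$ is geometrically distributed with mean $h/\varepsilon$
and $\lim_{\varepsilon \to 0} \varepsilon D_\varepsilon(t) = \ell_0(t)$ (see, e.g., page 48 of Itô and McKean [9]), so
$E_0 \ell_0(\tau_h) = h$ and

\[ E_0\bigl(e^{-\lambda \ell_0(\tau_h)}\bigr) = \frac{1/h}{\lambda + 1/h} = \frac{1}{1 + \lambda h}. \]

Using the explicit formula in (2) or the fact that $u(t,x)$ satisfies the heat
equation with a bounded boundary condition on $[0, \infty) \times \{0\}$ shows $u(t,x)$ is
Lipschitz continuous on $[0,T] \times [-K,K]$. Since $\tau_h$ has the same distribution
as $h^2 \tau_1$, $|u(t - \tau_h, h) - u(t,h)| = O(h^2)$. Using this with $P_0(\tau_h > t) = o(h)$,
we have

\[ \frac{\partial u}{\partial x}(t, 0+) = \lim_{h \to 0} \frac{u(t,h) - u(t,0)}{h}
 = u(t,0) \lim_{h \to 0} \frac{1 - E_0(e^{-\lambda \ell_0(\tau_h)})}{h} = \lambda u(t,0). \]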
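The boundary condition can be checked numerically. The sketch below assumes the closed form $u(t,x) = \mathrm{erf}(|x|/\sqrt{2t}) + e^{\lambda|x| + \lambda^2 t/2}\,\mathrm{Erfc}(|x|/\sqrt{2t} + \lambda\sqrt{t/2})$, which one gets by integrating (2) over $z$; that integration is not carried out in the text, so treat the formula as an assumption of this example. The function names are illustrative.

```python
import math


def u(t, x, lam):
    """E_x exp(-lam * l_0(t)) for standard Brownian motion (t > 0, lam >= 0).

    Closed form obtained by integrating formula (2) over z; this
    integration is an assumption of the sketch, not a formula from the text.
    Intended for moderate lam and t (the exponential can overflow otherwise).
    """
    b = abs(x) / math.sqrt(2.0 * t)
    a = lam * math.sqrt(t / 2.0)          # a * a = lam^2 * t / 2
    return math.erf(b) + math.exp(lam * abs(x) + a * a) * math.erfc(a + b)


def boundary_residual(t, lam, h=1e-6):
    """One-sided difference quotient at x = 0 minus lam * u(t, 0)."""
    return (u(t, h, lam) - u(t, 0.0, lam)) / h - lam * u(t, 0.0, lam)


def heat_residual(t, x, lam, d=1e-3):
    """Central-difference estimate of du/dt - (1/2) d^2u/dx^2 at (t, x), x > d."""
    ut = (u(t + d, x, lam) - u(t - d, x, lam)) / (2.0 * d)
    uxx = (u(t, x + d, lam) - 2.0 * u(t, x, lam) + u(t, x - d, lam)) / (d * d)
    return ut - 0.5 * uxx
```

Both residuals should be close to zero, up to discretization error, for any moderate choice of `t`, `x > 0` and `lam`.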

The remainder of the paper is devoted to proofs. Theorem 2 is fairly
straightforward to prove. Let $Z_t^L$ be the difference between the colony
numbers for the two lineages, and let $Y_m^L$ be the embedded jump chain, which
jumps when a lineage changes colonies. $Y^L(L^2 \cdot)/L$ converges to a Brownian
motion starting from 1. Using the fact that $|B_t| - \ell_0(t)$ is a martingale, it is
easy to show that if $V_m^L$ is the number of visits to 0 by $Y^L$ up to time $m$, then $V^L(L^2 \cdot)/L$
converges to the local time $\ell_0$. (Borodin [3] proved this for aperiodic mean
0, finite-variance random walks.) Each visit to 0 by $Y^L$ brings a probability
of coalescence of roughly $(1/M)/(\nu + 1/M)$ for our two lineages, and the result
follows from routine calculations. See Section 2 for details.

It is easy to give an intuitive proof of Theorem 3 along similar lines.
The difference in the location between two lineages in the genealogy of the voter
model is a continuous-time random walk that jumps at rate 2, so it is enough
to consider the embedded discrete-time jump chain. Let $X_k^N$ be a random
walk with jump distribution $q^N$. Let $1/2 < a < 1$. The number of visits to
$I = [-N^a, N^a]$ by time $Nt$, divided by $2N^a$, converges to the local time at
0 of a Brownian motion. If we look at the chain only when it is in $I$, then
we get a Markov chain that mixes more rapidly than its expected time to
hit 0, so a result of Aldous and Fill [1] implies that the hitting time of 0 for
the chain viewed on $I$ has approximately an exponential distribution.

To complete the proof outlined in the previous paragraph, one must prove
that the excursions away from $I$ are sufficiently independent of the behavior in $I$
so that the exponential waiting time and the local time are asymptotically
independent. We have not been able to formalize this intuition, so we will
instead pursue an approach based on the downcrossing definition of local
time. Let $T_0 = \inf\{k : |X_k^N| \le N^{5/6}\}$ and for $m \ge 0$ let

\[ S_m = \inf\{k > T_m : |X_k^N| > 2N^{5/6}\}, \]

\[ T_{m+1} = \inf\{k > S_m : |X_k^N| \le |X^N_{S_m}| - N^{5/6}\}. \]

Visits to 0 can only occur during $[T_m, S_m]$, while most of the time is spent in the
intervals $[S_m, T_{m+1}]$. The definition of $T_{m+1}$ is chosen so that the distribution
of $T_{m+1} - S_m$ is independent of $X^N(S_m)$, and this allows us to get the desired
asymptotic independence.

Section 3 gives the proof of Theorem 3 modulo three propositions that
are established later. Let $M^N(n) = \sup\{m : S_m \le n\}$ be the number of
cycles completed by time $n$. Proposition 1, proved in Section 4, gives the
convergence of $M^N(Nt)/N^{1/6}$ to local time. Proposition 2, proved in
Section 5, shows that the time spent in the intervals $[T_m, S_m]$ is a small
fraction of the total time. Proposition 3 gives asymptotics for the probability
of hitting 0 before time $S_m$ for the possible values of $X^N(T_m)$, which are
$\pm N^{5/6} + O(N^{1/2} \log N)$. Proposition 3 is the most difficult part of the proof.
It relies on estimates for the potential kernel, which are based on results for
the Green's function, which in turn come from a local central limit theorem.
The technical problem is that all of our estimates must be uniform in $N$.
These details occupy Sections 6 and 7.

2. Proof of Theorem 2. Let $Z_t^L$ be the difference in the colony numbers
at time $t$. Let $Y_m^L$ be the discrete-time embedded chain that jumps
whenever one of the two lineages changes colonies, and continues jumping even
after the two lineages have coalesced. $Y_m^L$ is a simple random walk. Recalling
$Y_0^L = L$, we let $W^L(t) = Y^L_{[L^2 t]}/L$. Since $W^L$ converges in distribution to a
standard Brownian motion $W(\cdot)$, and $C = C([0, \infty), \mathbb{R})$ with the topology of
uniform convergence on compact time intervals is a complete separable metric
space, Skorokhod's theorem implies that we can assume these processes have
been constructed on the same space so that $W^L(\cdot) \to W(\cdot)$ almost surely.
See, for example, Theorem 3.3 on page 7 of Billingsley [2].

Let $V_m^L$ be the number of visits to 0 by $Y_k^L$, $k \le m$. The next result has
been proved for finite-variance random walks by Borodin [3]. To keep this
paper self-contained, we will give a simple proof for the nearest-neighbor
case.

Lemma 1. $V^L(L^2 \cdot)/L \to \ell_0(\cdot)$, the local time at 0 for $W$, almost surely
in $C$.

Proof. Let $A^L(t) = V^L(L^2 t)/L$. An easy computation for simple
random walk shows that for any stopping time $S$

\[ E|A^L(S + t) - A^L(S)| \le E_0|A^L(t)| \le \frac{1}{L}\Bigl(1 + \sum_{m=1}^{L^2 t} C/\sqrt{m}\Bigr) \le C\sqrt{t}, \]

so by Aldous' criterion (see, e.g., Theorem 4.5 on page 320 of Jacod and
Shiryaev [10]) the sequence $A^L$ is tight. Let $A^{L_k}$ be a convergent
subsequence with limit $A$. $|W^{L_k}(t)| - A^{L_k}(t)$ is a martingale. Using the
maximal inequality on the random walk, and the dominated convergence theorem
on the increasing process, both processes converge to their limits in $L^1$. Since
conditional expectation is a contraction in $L^1$, it follows that $|W(t)| - A(t)$
is a martingale. $\ell_0(t)$ is the increasing process associated with $|W(t)|$. See,
for example, (11.2) on page 84 of Durrett [7]. By the uniqueness of the
Doob-Meyer decomposition $A(t) = \ell_0(t)$. This shows that there is only one
subsequential limit, so the entire sequence converges to $\ell_0(t)$. □
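The scaling in Lemma 1 can be sanity-checked at the level of expectations. The sketch below computes the exact expected number of visits to 0 by a simple random walk, started at 0 rather than at $L$ (a simplification made here for illustration), and compares it with $E\ell_0(1) = E|W_1| = \sqrt{2/\pi}$ for Brownian motion started at 0. The function name is hypothetical.

```python
import math


def expected_visits(n):
    """Expected number of visits to 0 at times 1..n by simple random walk started at 0."""
    total, term = 0.0, 1.0      # term = P(S_{2j} = 0) = C(2j, j) / 4^j, starting at j = 0
    for j in range(1, n // 2 + 1):
        term *= (2 * j - 1) / (2 * j)   # ratio of consecutive return probabilities
        total += term
    return total


L = 100
# Visits in L^2 steps, divided by L, should be near E l_0(1) = sqrt(2 / pi).
scaled = expected_visits(L * L) / L
target = math.sqrt(2.0 / math.pi)
```

Lemma 1 asserts almost sure convergence of the whole path, which is stronger than this comparison of means; the computation only illustrates that visits in $L^2$ steps are of order $L$.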

To move this result from $Y^L$ to $Z^L$, we note that time $m$ in $Y^L$
corresponds to a time $\sim m/2\nu$ in $Z^L$, and hence time $L^2 t/2\nu$ in $Z^L$ corresponds
to a time $\sim L^2 t$ in $Y^L$, where as usual $a_L \sim b_L$ means $a_L/b_L \to 1$. Now
$Z^L$ will have a geometric number of chances with mean $1/\nu$ for coalescence
between jumps of $Y^L$, so the probability of no coalescence during one visit to 0 is

\[ \sum_{j=1}^{\infty} \nu(1 - \nu)^{j-1}(1 - 1/M)^j = \frac{\nu(1 - 1/M)}{1/M + \nu(1 - 1/M)} \sim \frac{M\nu}{1 + M\nu}. \]

Recall our assumptions imply $M\nu \to \infty$ and hence $M \to \infty$.

When $m = L^2 t$, the number of visits to 0 by $Y_k^L$, $k \le m$, will be $\sim L\ell_0(t)$ and
hence the probability of no coalescence is

\[ \Bigl(1 - \frac{1}{1 + M\nu}\Bigr)^{L\ell_0(t)} \to e^{-(1/\alpha)\ell_0(t)}. \]

If $\xi$ is a mean 1 exponential, the right-hand side can be written as

\[ P\bigl((1/\alpha)\ell_0(t) < \xi\bigr) = P\bigl(\ell_0^{-1}(\alpha\xi) > t\bigr), \]
which completes the proof of Theorem 2.
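The limit in Theorem 2 can be compared with an exact finite-$L$ computation. The sketch below propagates the distribution of the embedded simple random walk started at $L$, killing it with probability $1/(1 + \alpha L)$ at each visit to 0 (that is, taking $M\nu = \alpha L$ exactly), and compares the surviving mass after $L^2 t$ steps with $E_1 \exp(-\ell_0(t)/\alpha)$. The closed form used for the latter comes from integrating (2) over $z$ and is an assumption of this example; both function names are hypothetical.

```python
import math


def limit_survival(t, alpha, x0=1.0):
    """P(l_0^{-1}(alpha * xi) > t) = E_{x0} exp(-l_0(t) / alpha), via a closed
    form obtained by integrating formula (2) over z (an assumption of this sketch)."""
    lam = 1.0 / alpha
    b = abs(x0) / math.sqrt(2.0 * t)
    a = lam * math.sqrt(t / 2.0)
    return math.erf(b) + math.exp(lam * abs(x0) + a * a) * math.erfc(a + b)


def walk_survival(L, alpha, t):
    """Exact no-coalescence probability for the embedded chain: a simple random
    walk started at L, run for L^2 t steps, killed with probability
    1 / (1 + alpha * L) on each visit to 0 (taking M * nu = alpha * L)."""
    p_kill = 1.0 / (1.0 + alpha * L)
    n = int(L * L * t)
    size = 2 * n + 3                      # index i <-> position L - n - 1 + i
    start, zero = n + 1, n + 1 - L
    dist = [0.0] * size
    dist[start] = 1.0
    for _ in range(n):
        # One step of simple random walk; the two end cells are never reached.
        dist = [0.0] + [0.5 * (dist[i - 1] + dist[i + 1])
                        for i in range(1, size - 1)] + [0.0]
        if 0 < zero < size - 1:
            dist[zero] *= 1.0 - p_kill    # survive the visit to 0
    return sum(dist)
```

With `L = 30`, `alpha = 1` and `t = 1` the two quantities should agree to within a few percent; the gap is expected to shrink as `L` grows.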
3. Proof of Theorem 3. Here we give the proof, assuming the truth of
three propositions that will be proved in the next three sections. Let $X_k^N$,
$k \ge 0$, be a discrete-time random walk with jump distribution $q^N$. To avoid
some annoying little details, it is convenient to suppose that $X_0^N = x_N >
2N^{5/6}$. To extend to the general case, it is enough to show that starting
from $x \ne 0$ the probability of hitting 0 before time $S_0$ defined below tends
to 0, but this follows from Lemma 5.

Define two interleaved sequences of stopping times as follows. Let $T_0 = -1$
and for $m \ge 0$ let

\[ S_m = \inf\{k > T_m : |X_k^N| > 2N^{5/6}\}, \]

\[ T_{m+1} = \inf\{k > S_m : |X_k^N| \le |X^N_{S_m}| - N^{5/6}\}. \]

$S_m$ is the exit time from the larger strip $[-2N^{5/6}, 2N^{5/6}]$. Since

\[ 2N^{5/6} - BN^{1/2} \log N \le |X^N(S_m)| \le 2N^{5/6} + BN^{1/2} \log N, \]

$T_m$ is almost the hitting time of the smaller strip $[-N^{5/6}, N^{5/6}]$. The
advantage of this definition is that the processes $\{|X^N(S_m + k)| - |X^N(S_m)|,\ 
0 \le k \le T_{m+1} - S_m\}$ are identically distributed for $m \ge 0$ and independent of
$\mathcal{F}(S_m)$. Here and in what follows, we will write $X^N(S_m)$ instead of $X^N_{S_m}$ to
avoid double subscripts.

Let $M^N(n) = \sup\{m : S_m \le n\}$ be the number of cycles completed by
time $n$ and let $L^N(n) = |\{1 \le m \le M^N(n) : X^N(S_{m-1}) X^N(S_m) < 0\}|$ be the
number of crossings of $[-2N^{5/6}, 2N^{5/6}]$ by the random walk. Our first result,
to be proved later, is:

Proposition 1. Suppose $x_N/\sigma N \to x_0$. Then

\[ 2L^N(Nt)/N^{1/6} \Rightarrow \sigma \ell_0(t) \quad\text{and}\quad M^N(Nt)/2N^{1/6} \Rightarrow \sigma \ell_0(t), \]
where $\ell_0(\cdot)$ is the local time at 0 of a standard Brownian motion started at
$x_0$.

Let $J = \inf\{m : \exists k \in [T_m, S_m],\ X_k^N = 0\}$. The fact that one-dimensional
finite-range random walks are recurrent implies $J < \infty$. By the definitions
of $S_m$ and $T_m$, $t_0 \in [T_J, S_J]$. Splitting things up according to the value of $J$,

\[ \sum_{j=0}^{\infty} P\{J = j,\ T_j > Nt\} \le P\{t_0 > Nt\} \le \sum_{j=0}^{\infty} P\{J = j,\ S_j > Nt\}. \]

We will show that both series converge to the same limit as $N \to \infty$,
thereby proving that $P\{t_0 > Nt\}$ converges to this limit as well. We first
truncate the sums by neglecting the terms having $j > N^{2/9}$:

\[ \sum_{j=0}^{N^{2/9}} P\{J = j,\ T_j > Nt\} \le P\{t_0 > Nt\} \le \sum_{j=0}^{N^{2/9}} P\{J = j,\ S_j > Nt\} + \varepsilon_1^N, \tag{3} \]

where, as Lemma 2 will show,

\[ \varepsilon_1^N \le \sum_{j > N^{2/9}} P\{J = j\} = O(\exp(-cN^{1/18})). \]

Defining $A_j = T_0 + \sum_{m=1}^{j} (T_m - S_{m-1})$ and $B_j = \sum_{m=0}^{j} (S_m - T_m)$, we can
write $S_j = A_j + B_j$. For the reader's intuition, we note that $T_m - S_{m-1}$ is the
hitting time of a half-line, while $S_m - T_m$ is the exit time from a bounded
strip. The first variable has infinite mean and the latter finite variance, so
we expect $A_j \gg B_j$ for large $j$.

It is clear that

\[ \sum_{j=0}^{N^{2/9}} P\{J = j,\ A_j > Nt\} \le \sum_{j=0}^{N^{2/9}} P\{J = j,\ T_j > Nt\}. \tag{4} \]

Our next task is to argue that

\[ \sum_{j=0}^{N^{2/9}} P\{J = j,\ S_j > Nt\} \le \sum_{j=0}^{N^{2/9}} P\{J = j,\ A_j > Nt - 2N^{17/18}\} + \varepsilon_2^N, \tag{5} \]

where $\varepsilon_2^N$ is another small error, this time $O(N^{-1/3})$. To prove the last
inequality we note that, for any $j \le N^{2/9}$,

\[ \{J = j,\ S_j > Nt\} \subset \{J = j,\ A_j > Nt - 2N^{17/18}\} \cup \{J = j,\ B_{N^{2/9}} > 2N^{17/18}\}. \]

In the last display we should have written the integer part $[N^{2/9}]$, but in
what follows we will ignore these insignificant details. Taking now the union
over $j$, (5) will follow from the following proposition.

Proposition 2. For large $N$, $P\{B_{N^{2/9}} > 2N^{17/18}\} \le CN^{-1/3}$.

Combining inequalities (3), (4) and (5), we can restrict ourselves to
estimating probabilities of the form $P\{J = j,\ A_j > Ns\}$. The two events here
are almost independent. $A_j$ is determined by the behavior of increments
of the random walk in the intervals $[S_m, T_{m+1}]$, while $J$ is determined by
the behavior in $[T_m, S_m]$. There is some dependence that comes through the
value of the starting points $X^N(T_m)$, but because of assumption 5, these
are all within distance $BN^{1/2} \log N$ of $N^{5/6}$ or $-N^{5/6}$. As the reader can
probably guess, the variability in the starting point makes little difference:

Proposition 3. Suppose $|x - N^{5/6}| \le BN^{1/2} \log N$ and let $H_I^N(x,0)$
denote the probability that the random walk started at $x$ hits 0 before
leaving the set $I = [-2N^{5/6}, 2N^{5/6}]$. There is a constant $C$ so that

\[ \bigl| N^{1/6} H_I^N(x,0) - 1/\sigma^2 \bigr| \le CN^{-1/6} \log N. \]

The bound in Proposition 3 is uniform over the possible values of $X^N(T_m)$,
so for simplicity we will write $c_N = 1/\sigma^2 + O(N^{-1/6} \log N)$ for the value of
$N^{1/6} H_I^N(X^N(T_m), 0)$.

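Lemma 2. For every $u \ge 0$ and $j$ we have

\[ P\{J = j,\ A_j > u\} = \frac{c_N}{N^{1/6}} \Bigl(1 - \frac{c_N}{N^{1/6}}\Bigr)^j P\{A_j > u\}, \tag{6} \]

and hence $\sum_{j > N^{2/9}} P\{J = j\} \le (1 - c_N/N^{1/6})^{N^{2/9}} \le \exp(-cN^{1/18})$.

Proof. Let $I_m = 1\{t_0 \in [T_m, S_m]\}$ and let $\Delta_k = A_k - A_{k-1}$ for $k \ge 0$,
where $A_{-1} = 0$. Using the strong Markov property, Proposition 3, the fact
that $\Delta_j$ is independent of $\mathcal{F}(S_{j-1})$ and induction, it is easy to see that

\[ P(\Delta_0 = v_0, I_0 = 0, \Delta_1 = v_1, \ldots, I_{j-1} = 0, \Delta_j = v_j, I_j = 1)
 = \frac{c_N}{N^{1/6}} \Bigl(1 - \frac{c_N}{N^{1/6}}\Bigr)^j \prod_{k=0}^{j} P(\Delta_k = v_k). \]

Since the $\Delta_k$ are independent, the desired result follows by summing over
$v_0, \ldots, v_j$ that sum to more than $u$. □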

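The lower bound in (4) and the upper bound in (5) are similar, so it is
enough to investigate the lower bound. Using Lemma 2 on the left-hand side
of (4) gives

\[ P\{t_0 > Nt\} \ge \sum_{j=0}^{N^{2/9}} \frac{c_N}{N^{1/6}} \Bigl(1 - \frac{c_N}{N^{1/6}}\Bigr)^j P\{A_j > Nt\}. \]

Using Proposition 2, we get

\[ P\{t_0 > Nt\} \ge \sum_{j=0}^{N^{2/9}} \frac{c_N}{N^{1/6}} \Bigl(1 - \frac{c_N}{N^{1/6}}\Bigr)^j P\{S_j > Nt + 2N^{17/18}\} + \varepsilon_3^N, \]

where $\varepsilon_3^N$ is an error of order $N^{-1/3}$. Recalling the definition of $M^N$, the
above is

\[ \sum_{j=0}^{N^{2/9}} \frac{c_N}{N^{1/6}} \Bigl(1 - \frac{c_N}{N^{1/6}}\Bigr)^j P\{M^N(Nt + 2N^{17/18}) < j\} + \varepsilon_3^N. \tag{7} \]

Proposition 1 implies that

\[ P\{M^N(Nt + 2N^{17/18}) < sN^{1/6}\} \to P(\sigma \ell_0(t) < s/2). \]

Let $c_0 = 1/\sigma^2 = \lim_{N \to \infty} c_N$. The dominated convergence theorem now
implies that (7) converges to

\[ \int_0^\infty c_0 e^{-c_0 s} P\{\sigma \ell_0(t) < s/2\}\,ds. \]

Introducing a mean 1 exponential random variable, $\xi$, independent of $\ell_0(t)$,
and recalling $c_0 = 1/\sigma^2$ is 1 over the mean of the exponential, this can be
written as

\[ P\{\sigma \ell_0(t) < \sigma^2 \xi/2\} = P\{\ell_0^{-1}(\sigma\xi/2) > t\}, \]

which is the conclusion of Theorem 3. It remains to prove the three
propositions.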

4. Proof of Proposition 1. Consider the sequence of random walks $X_k^N =
x_N + \sum_{i=1}^{k} \xi_i^N$ where for each $N$, $x_N > 2N^{5/6}$ and the variables $\xi_i^N$, $i \ge 1$,
are i.i.d. with distribution $q^N$. Define the sequence of stopping times $K_0 = 0$
and for $j \ge 0$

\[ K_{2j+1} = \inf\{k > K_{2j} : X_k^N < -2N^{5/6}\}, \]

\[ K_{2j+2} = \inf\{k > K_{2j+1} : X_k^N > 2N^{5/6}\}. \tag{8} \]

In words, the $K_{2j+1}$ correspond to times at which the random walk finishes
a downcrossing of the interval $[-2N^{5/6}, 2N^{5/6}]$ and the $K_{2j+2}$ correspond to
times at which the random walk finishes an upcrossing of the same interval.

To connect with the definitions given just before Proposition 1 in the
previous section, note that $\{S_m : m \ge 0\} \supset \{K_k : k \ge 0\}$ (it is for this reason
that we want $x_N > 2N^{5/6}$), so we have

\[ L^N(n) = \sup\{j : K_j \le n\}. \]

Here and in what follows, even though $\sigma_N \to \sigma$ we will drop the subscript
$N$ for simplicity.

Lemma 3. Suppose that $x_N/\sigma N \to x_0$, the $\xi_i^N$ are i.i.d. with $E\xi_i^N = 0$,
$E(\xi_i^N)^2 = N\sigma^2$ and $E(\xi_i^N)^4 \le CN^2$. Then

\[ 2N^{-1/6} L^N([Nt]) \Rightarrow \sigma \ell_0(t) \]

as $N \to \infty$, where $\ell_0(t)$ denotes the local time at 0 for a standard Brownian
motion starting from $x_0$.

Proof. We first rescale the random walks by letting

\[ S_k^N = \frac{X_k^N}{\sigma N} = \frac{x_N}{\sigma N} + \frac{1}{\sigma N} \sum_{i=1}^{k} \xi_i^N. \]

Let $Y^N(t) = S^N_{[Nt]}$. Our first task is to argue that it is possible to define the
$Y^N$'s and a Brownian motion $B$ on the same probability space so that
for each fixed $t$, the events

\[ \Omega_N = \Bigl\{ \sup_{0 \le s \le t} |B(s) - Y^N(s)| \le N^{-5/24} \Bigr\} \tag{9} \]

satisfy $P(\Omega_N) \to 1$.

To prove this, we begin by recalling a well-known construction of
Skorokhod; see, for example, Section 7.6 in Durrett [7]. Given a Brownian motion
$B$ and a value of $N$, this procedure constructs a sequence of stopping times
$T_k^N$, $k \ge 1$, that satisfy

\[ B(T_k^N) \overset{d}{=} Y^N(k/N) \]

and are such that the increments $\tau_k^N = T_k^N - T_{k-1}^N$ are independent,
nonnegative random variables having mean $E\tau_k^N = E(\xi_i^N/\sigma N)^2 = 1/N$, and
variance

\[ \mathrm{var}(\tau_k^N) \le C E(\xi_i^N/\sigma N)^4 \le C/N^2. \]

For $s \in [k/N, (k+1)/N)$, we have

\[ |Y^N(s) - B(s)| = |Y^N(k/N) - B(s)| \le |B(T_k^N) - B(k/N)| + |B(k/N) - B(s)|. \]

We now fix $t$, and argue that there are sets $\Omega_N^1$ with $P(\Omega_N^1) \to 1$ on which

\[ |T_k^N - k/N| \le N^{-11/24} \quad\text{for all } k \le Nt. \tag{10} \]

Kolmogorov's maximal inequality (see, e.g., (4.3) in Chapter 4 of Durrett
[8]) applied to the martingale $T_k^N - k/N$ gives

\[ P\Bigl( \sup_{k \le Nt} |T_k^N - k/N| > N^{-11/24} \Bigr) \le N^{11/12}\, Nt\, \mathrm{var}(\tau_1^N) \le CtN^{-1/12}. \]

By Lévy's result on the modulus of continuity for Brownian motion we
can find sets $\Omega_N^2$, with $P(\Omega_N^2) \to 1$ and such that, on $\Omega_N^2$, $x, y \le t$ and
$|x - y| \le N^{-11/24}$ imply (see, e.g., (4.10) in Chapter 7 of Durrett [8])

\[ |B(x) - B(y)| \le 10\bigl(|x - y| \log(|x - y|^{-1})\bigr)^{1/2} \le (1/2)|x - y|^{5/11}, \]

the last inequality holding for large $N$ since $5/11 < 1/2$. On $\Omega_N^1 \cap \Omega_N^2$ we
have for $s \in [k/N, (k+1)/N)$

\[ |Y^N(s) - B(s)| \le |B(T_k^N) - B(k/N)| + |B(k/N) - B(s)|
 \le \frac{1}{2}\bigl(N^{-(11/24)(5/11)} + N^{-5/11}\bigr) \le N^{-5/24}, \]

which proves (9).

Having established (9), the rest of the proof of Lemma 3 is
straightforward. Let $a_N = (1/\sigma) 2N^{-1/6}$ and $b_N = (1/\sigma) N^{-5/24}$. Using definitions
similar to the $K_j$ in (8), we can define $\mathcal{L}_N^-(t)$ and $\mathcal{L}_N^+(t)$ to be the
number of times the Brownian motion $B_s$, $0 \le s \le t$, has crossed the strips
$[-a_N + b_N, a_N - b_N]$ and $[-a_N - b_N, a_N + b_N]$, respectively. On the events
$\Omega_N$ we have

\[ \mathcal{L}_N^+(t) \le L^N(Nt) \le \mathcal{L}_N^-(t). \]

On the other hand, a classical result obtained by Lévy on the convergence of
downcrossings to local time (see Itô and McKean [9], page 48) implies that,
as $N \to \infty$,

\[ (a_N + b_N)\, \mathcal{L}_N^+(t) \to \ell_0(t), \qquad (a_N - b_N)\, \mathcal{L}_N^-(t) \to \ell_0(t). \]

To check the constant, recall that one multiplies the number of
downcrossings by the width of the strip, but here we count up- and downcrossings.
This completes the proof of Lemma 3. □

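To prove the convergence result for $M^N$ given in Proposition 1, we let

\[ \Gamma(n) = |\{1 \le m \le n : X^N(S_{m-1}) X^N(S_m) < 0\}| \]

and note that $L^N = \Gamma(M^N)$. Let $\gamma_m = 1$ if $X^N(S_{m-1}) X^N(S_m) < 0$ and
$\gamma_m = 0$ otherwise. We have

\[ -B\sqrt{N} \log N \le |X^N(T_m)| - N^{5/6} \le B\sqrt{N} \log N, \]
so using the fact that $X^N$ is a martingale,

\[ P(\gamma_m = 1 \mid \mathcal{F}(T_m)) = 1/4 + O(N^{-1/3} \log N). \tag{11} \]

Let $\bar\Gamma(n) = \Gamma(n) - \sum_{m=1}^{n} P(\gamma_m = 1 \mid \mathcal{F}(T_m))$. $\bar\Gamma(n)$ is a martingale so the
maximal inequality and the orthogonality of martingale increments imply

\[ E\Bigl( \sup_{m \le n} \bar\Gamma(m)^2 \Bigr) \le C \sum_{m=1}^{n} E\bigl(\gamma_m - P(\gamma_m = 1 \mid \mathcal{F}(T_m))\bigr)^2 \le Cn. \]

Chebyshev's inequality implies

\[ P\Bigl( \sup_{m \le n} |\bar\Gamma(m)| > n^{2/3} \Bigr) \le Cn^{-1/3}. \]

The last result when combined with (11) implies that with high probability

\[ \Gamma(n) = n/4 + O(nN^{-1/3} \log N) + O(n^{2/3}). \]

We want to conclude from this that

\[ L^N_n = \Gamma(M^N_n) \sim M^N_n/4. \]

To deal with the random index, we take $n = N^{1/5}$ and let $R = \inf\{r : M^N(r) >
N^{1/5}\}$ to get

\[ P\Bigl( \sup_{s \le Nt \wedge R} |L^N(s) - M^N(s)/4| > 2N^{2/15} \Bigr) \le CN^{-1/15}. \]

Since Lemma 3 shows that $L^N(Nt)$ is of order $N^{1/6}$, which is much smaller
than $N^{1/5}$, we must have $P\{R \le Nt\} \to 0$
and the proof of Proposition 1 is complete.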
5. Proof of Proposition 2. In this section we will show that for large $N$,

\[ P\{B_{N^{2/9}} > 2N^{17/18}\} \le CN^{-1/3}, \]

where $B_j = \sum_{m=0}^{j} (S_m - T_m)$. To do this we will compute the mean and
variance of $B_j$ and then use Chebyshev's inequality. For this we first need to
compute the first two moments of $\eta_m = S_m - T_m$. If we assume $X^N(T_m) = N^{5/6}$,
$|X^N(S_m)| = 2N^{5/6}$, and replace our random walk by a Brownian motion $B_t$
with variance $\sigma^2 N t$, this would be easy. $B_t^2 - \sigma^2 N t$ and $B_t^4 - 6\sigma^2 N B_t^2 t +
3\sigma^4 N^2 t^2$ are martingales so if $B_0 = N^{5/6}$ and $\eta = \inf\{t : B_t \notin [-2N^{5/6}, 2N^{5/6}]\}$,
then using $|B_\eta| = 2N^{5/6}$ we have

\[ 4N^{10/6} - \sigma^2 N\, E\eta = N^{10/6}, \]
\[ 16N^{20/6} - 24\sigma^2 N^{16/6}\, E\eta + 3\sigma^4 N^2\, E\eta^2 = N^{20/6}. \]

To prove this one must use the optional stopping theorem at $\eta \wedge m$ and then
let $m \to \infty$. The details of using the monotone and dominated convergence
theorems to justify the equalities are left to the reader. Solving gives

\[ E\eta = 3N^{2/3}/\sigma^2, \qquad E\eta^2 = 19N^{4/3}/\sigma^4. \]
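The two optional-stopping identities form a small linear system, and the arithmetic is easy to verify exactly. The sketch below works in units where $N^{5/6} = 1$ and $\sigma^2 N = 1$ (a normalization chosen here for illustration), so the Brownian motion starts at 1, exits at $\pm 2$, and has unit variance rate. The function name is hypothetical.

```python
from fractions import Fraction


def exit_time_moments(start, barrier, rate):
    """First two moments of the exit time eta of [-barrier, barrier] for a
    Brownian motion with variance rate * t started at `start`, obtained by
    solving the two optional-stopping identities (assumes |B_eta| = barrier)."""
    start, barrier, rate = Fraction(start), Fraction(barrier), Fraction(rate)
    # First identity:  barrier^2 - rate * E[eta] = start^2
    m1 = (barrier**2 - start**2) / rate
    # Second identity: barrier^4 - 6 rate barrier^2 E[eta] + 3 rate^2 E[eta^2] = start^4
    m2 = (start**4 - barrier**4 + 6 * rate * barrier**2 * m1) / (3 * rate**2)
    return m1, m2


# In normalized units this reproduces the constants 3 and 19 of the text.
mean_eta, second_moment_eta = exit_time_moments(start=1, barrier=2, rate=1)
```

Restoring the units multiplies the first moment by $N^{10/6}/(\sigma^2 N)$ and the second by its square, which gives the stated powers of $N$ and $\sigma$.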

These facts are approximately true for the random walk. We begin with
the martingales. To compare with the previous calculation, recall that for
the normal distribution $E\xi^4 = 3(E\xi^2)^2$.

Lemma 4. Suppose $X_k = X_0 + \xi_1 + \cdots + \xi_k$ where the $\xi_i$ are independent with $E\xi_i = 0$, $E\xi_i^2 = a$,
$E\xi_i^3 = 0$ and $E\xi_i^4 = \beta$. Then $X_k^2 - ka$ and

\[ X_k^4 - 6aX_k^2 k + 3a^2 k^2 + (3a^2 - \beta)k \]

are martingales.

Proof. The martingale $X_k^2 - ka$ is well known. See, for example,
Exercise 2.6 on page 235 of Durrett [8]. To check the second, expand $(X_k + \xi_{k+1})^4$
and use $E\xi_{k+1} = 0$ and $E\xi_{k+1}^3 = 0$ to conclude

\[ E(X_{k+1}^4 \mid \mathcal{F}_k) = X_k^4 + 6X_k^2 a + \beta \]

and hence

\[ E\bigl(X_{k+1}^4 - 6X_k^2 (k+1)a - \beta(k+1) \bigm| \mathcal{F}_k\bigr) = X_k^4 - 6X_k^2 ka - \beta k. \]

To get the martingale we want, the $X_k^2$ on the left should be $X_{k+1}^2$. To correct
this we note

\[ E\bigl(-6(X_{k+1}^2 - X_k^2)(k+1)a \bigm| \mathcal{F}_k\bigr) + 3a^2(k+1)^2 + 3a^2(k+1) \]
\[ = -6a^2(k+1) + 3a^2(k+1)^2 + 3a^2(k+1) \]
\[ = 3a^2[(k+1)^2 - (k+1)] = 3a^2 k^2 + 3a^2 k. \]
Adding the last two equations gives the desired result. □
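The one-step identity behind Lemma 4 can be verified exactly for a concrete jump law. The sketch below uses a small symmetric distribution on $\{-2, \ldots, 2\}$, chosen here purely for illustration (any law with mean 0 and third moment 0 would do), and checks that the conditional mean of the fourth-moment expression at time $k+1$ equals its value at time $k$. All names are hypothetical.

```python
from fractions import Fraction


def second_and_fourth_moments(jumps):
    """Return (a, beta) = (E xi^2, E xi^4) for a jump law {value: probability}."""
    a = sum(p * z**2 for z, p in jumps.items())
    beta = sum(p * z**4 for z, p in jumps.items())
    return a, beta


def fourth_moment_martingale(x, k, a, beta):
    """M_k = x^4 - 6 a x^2 k + 3 a^2 k^2 + (3 a^2 - beta) k  evaluated at X_k = x."""
    return x**4 - 6 * a * x**2 * k + 3 * a**2 * k**2 + (3 * a**2 - beta) * k


def conditional_mean_next(x, k, jumps):
    """E[M_{k+1} | X_k = x], computed by averaging over the next jump."""
    a, beta = second_and_fourth_moments(jumps)
    return sum(p * fourth_moment_martingale(x + z, k + 1, a, beta)
               for z, p in jumps.items())


# Illustrative symmetric jump law (an assumption of this sketch): mean 0, third moment 0.
JUMPS = {-2: Fraction(1, 8), -1: Fraction(1, 4), 0: Fraction(1, 4),
         1: Fraction(1, 4), 2: Fraction(1, 8)}
```

Because the weights are exact rationals, equality can be tested with `==` rather than a tolerance.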

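In our case $a = \sigma^2 N$ and $\beta \le CN^2$. Letting $\mathcal{G}_{m-1} = \mathcal{F}(T_m)$ and using
the optional stopping theorem on our first martingale with $|X^N(T_m)| \ge
N^{5/6} - BN^{1/2} \log N$ and $|X^N(S_m)| \le 2N^{5/6} + BN^{1/2} \log N$, we have

\[ \sigma^2 N\, E(\eta_m \mid \mathcal{G}_{m-1}) \le (2N^{5/6} + BN^{1/2} \log N)^2 - (N^{5/6} - BN^{1/2} \log N)^2
 = 3N^{10/6} + O(N^{8/6} \log N), \]

and it follows that if $C_1 > 3/\sigma^2$, then for large $N$

\[ E(\eta_m \mid \mathcal{G}_{m-1}) \le C_1 N^{2/3}. \tag{12} \]
From the second martingale we get

\[ E\bigl(X^N(S_m)^4 - 6aX^N(S_m)^2 \eta_m + 3a^2 \eta_m^2 + (3a^2 - \beta)\eta_m \bigm| \mathcal{G}_{m-1}\bigr) = X^N(T_m)^4. \]

Rearranging and using $X^N(S_m)^4 \ge X^N(T_m)^4$ gives

\[ 3a^2 E(\eta_m^2 \mid \mathcal{G}_{m-1}) \le E\bigl([6aX^N(S_m)^2 + \beta]\,\eta_m \bigm| \mathcal{G}_{m-1}\bigr). \]

Using $|X^N(T_m)| \le N^{5/6} + BN^{1/2} \log N$ and $|X^N(S_m)| \le 2N^{5/6} + BN^{1/2} \log N$
with (12), $a = \sigma^2 N$ and $\beta \le CN^2$, gives

\[ 3\sigma^4 N^2 E(\eta_m^2 \mid \mathcal{G}_{m-1}) \le \bigl[6(\sigma^2 N)(2N^{5/6} + BN^{1/2} \log N)^2 + CN^2\bigr] \cdot C_1 N^{2/3}. \]

The first term in the square brackets is of order $N \cdot N^{10/6} \gg N^2$. It follows
that if $C_2 > 8C_1/\sigma^2$, then for large $N$

\[ E(\eta_m^2 \mid \mathcal{G}_{m-1}) \le C_2 N^{4/3}. \tag{13} \]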

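To estimate the size of $B_j$, recall $\mathcal{G}_{m-1} = \mathcal{F}(T_m)$ for $m \ge 0$ and write

\[ B_j = \sum_{m=0}^{j} E(\eta_m \mid \mathcal{G}_{m-1}) + \sum_{m=0}^{j} \bigl[\eta_m - E(\eta_m \mid \mathcal{G}_{m-1})\bigr] = \Sigma_1 + \Sigma_2. \]

By (12), if $j \le N^{2/9}$, then the first sum

\[ \Sigma_1 \le (j + 1) C_1 N^{2/3} \le 2C_1 N^{8/9}. \]

To bound the second sum, we use the orthogonality of martingale increments
and (13) to conclude

\[ E(\Sigma_2^2) = \sum_{m=0}^{j} E\bigl(\eta_m - E(\eta_m \mid \mathcal{G}_{m-1})\bigr)^2 \le (j + 1) C_2 N^{4/3}. \]

When $j \le N^{2/9}$, the right-hand side is $\le 2C_2 N^{14/9}$, so Chebyshev's inequality gives

\[ P\{\Sigma_2 > N^{17/18}\} \le 2C_2 N^{14/9} N^{-17/9} = 2C_2 N^{-1/3}. \]

Since $2C_1 N^{8/9} \le N^{17/18}$ for large $N$, combining the bounds on $\Sigma_1$ and $\Sigma_2$
gives the conclusion of Proposition 2.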
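The proof of Proposition 2 is mostly bookkeeping of powers of $N$, which can be checked with exact fractions. The variable names below are illustrative labels for the exponents that appear in the argument.

```python
from fractions import Fraction as F

# Exponents of N appearing in the proof of Proposition 2.
cycles = F(2, 9)                  # number of cycles: j <= N^{2/9}
mean_exp = cycles + F(2, 3)       # first sum:  (j + 1) C_1 N^{2/3}  is of order N^{8/9}
var_exp = cycles + F(4, 3)        # variance of the second sum       is of order N^{14/9}
threshold = F(17, 18)             # each sum is compared with N^{17/18}
chebyshev_exp = var_exp - 2 * threshold   # exponent in the Chebyshev bound
tail_exp = cycles - F(1, 6)       # N^{2/9} / N^{1/6} = N^{1/18}, as in Lemma 2

summary = {
    "first sum below threshold": mean_exp < threshold,
    "Chebyshev exponent": chebyshev_exp,
    "Lemma 2 exponent": tail_exp,
}
```

The first sum is negligible because $8/9 < 17/18$, and the Chebyshev exponent is $14/9 - 17/9 = -1/3$.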
6. Proof of Proposition 3. Recall that the recurrent potential kernel is
defined by

\[ a(x,y) = \sum_{n=0}^{\infty} \bigl(p_n(x,y) - p_n(y,y)\bigr), \]

where $p_n$ is the $n$-step transition probability of the random walk. To see the
reason for this definition, note that

\[ \sum_x p(z,x)\, a(x,y) = \sum_{n=0}^{\infty} \bigl(p_{n+1}(z,y) - p_n(y,y)\bigr) = a(z,y), \qquad z \ne y, \]
\[ \sum_x p(y,x)\, a(x,y) = \sum_{n=0}^{\infty} \bigl(p_{n+1}(y,y) - p_n(y,y)\bigr) = -1, \tag{14} \]

so $a$ is the analogue of the Green's function for recurrent random walks. The
key to the proof of Proposition 3 is the following result whose proof is given
in the next section. Let $\delta(x,y) = 1$ if $x = y$ and 0 otherwise.

Proposition 4. Assume a sequence of random walks satisfies
assumptions 1-5 of Section 1. There is a constant $C$ independent of $N$ such that,
for all $x$ and $y$, their recurrent potential kernels satisfy

\[ \Bigl| a^N(x,y) - \Bigl( -1 + \delta(x,y) - \frac{|x - y|}{\sigma^2 N} \Bigr) \Bigr| \le \frac{C}{\sqrt{N}}. \]

This estimate is only useful for $|x - y| \gg \sqrt{N}$. Our interest in this result is
that it gives the following estimate on the Green's function $G_I^N(x,y)$, which
is defined to be the expected number of visits to $y$ starting at $x$ before
leaving the set $I$. If we let $\tau_I$ be the exit time from $I$, then in symbols,

\[ G_I^N(x,y) = \sum_{k=0}^{\infty} P_x\{X_k^N = y,\ k < \tau_I\}. \]

We will be interested in the case $I = [-M, M]$ with $M = 2N^{5/6}$.

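Proposition 5. There is a $C$ independent of $N$ such that for all $x$ and
$y$ in $I$

\[ \Bigl| G_I^N(x,y) - \Bigl[ \delta(x,y) + \frac{M}{\sigma^2 N} \Bigl( 1 - \frac{|x - y|}{M} - \frac{xy}{M^2} \Bigr) \Bigr] \Bigr| \le CN^{-1/3} \log N. \]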

Remark. To see that the formula in square brackets is reasonable, note
that it vanishes when $x = M$ or $x = -M$ and for fixed $y$ is linear for $x \in
[-M, y]$ and $x \in [y, M]$.
Proof of Proposition 5. The first step is to note that (14) implies
$a^N(X_n, y) + \sum_{m=0}^{n-1} \delta(X_m, y)$ is a martingale, so

\[ G_I^N(x,y) = E_x\bigl[a^N(x,y) - a^N(X_{\tau_I}, y)\bigr]. \tag{15} \]

From (15) we have, for each fixed $N$, and $x, y \in [-M, M]$:

\[ G_I^N(x,y) = a^N(x,y) - P_x\{X_{\tau_I} > M\}\, E_x[a^N(X_{\tau_I}, y) \mid X_{\tau_I} > M]
 - P_x\{X_{\tau_I} < -M\}\, E_x[a^N(X_{\tau_I}, y) \mid X_{\tau_I} < -M]. \tag{16} \]

Using now that < - M < BN^^"^ log AT when > M, the correspond- 
ing inequality for exiting at — M, and the fact that the random walk is a 
martingale, we have 

x < P^X^ > M}(M + BVNlog N) + {1- P,{X^ > M})(-M), 
X > Px{X^j > M}M + (1 - Px{X^ > M})(-M - BVNlogN). 

Using these equations we have 

M + x ^r^t^N^ ^^^^ ^ M + X + B^/NlogN 



< P^{X'' > M} < 



2M + By/N\ogN~ ' 2M + By/N\ogN ' 

and it follows that 

P,{X!^^ >M} = + 0{^\ogN/M). 

Subtracting from 1, 

P.{X,^ < -M} = + 0{^\ogN/M). 

Using the last two formulas and Proposition 4 in (16), 
Gf (x,y) = -1 + 5{x,y) - + 0(l/ViV) 

+ + 0(^/iV log iV/M)) (l + + 0(logiV/v^)) 

+ + 0{VN\ogN/M)^ [l + + 0(logiV/^/iV)) . 

The worst error term is 0{\fN\ogN/M) = 0{N~^^^\ogN). Ignoring the 
error terms, the sum of the second and third lines is 

M yx 

1 + 



0-2X Ma'^N' 
Adding this to the first line completes the proof. □ 
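The algebra in the last step of the proof is an exact identity between rational functions, so it can be confirmed with exact arithmetic. The sketch below drops all $O(\cdot)$ terms, writes `s` for $\sigma^2 N$, and uses arbitrary integer values of $M$ and `s`; the function names are hypothetical.

```python
from fractions import Fraction

def lines_two_and_three(x, y, M, s):
    """Main terms of the second and third lines, with s standing for sigma^2 N."""
    p_up = Fraction(M + x, 2 * M)    # leading term of P_x{exit above M}
    p_down = Fraction(M - x, 2 * M)  # leading term of P_x{exit below -M}
    return p_up * (1 + Fraction(M - y, s)) + p_down * (1 + Fraction(M + y, s))

def claimed_sum(x, y, M, s):
    """1 + M / s - x y / (M s), the sum stated in the proof."""
    return 1 + Fraction(M, s) - Fraction(x * y, M * s)

def first_line(x, y, s):
    """Main terms of the first line: -1 + delta(x, y) - |x - y| / s."""
    return -1 + (1 if x == y else 0) - Fraction(abs(x - y), s)

def bracket(x, y, M, s):
    """The expression in square brackets in Proposition 5."""
    delta = 1 if x == y else 0
    return delta + Fraction(M, s) * (
        1 - Fraction(abs(x - y), M) - Fraction(x * y, M * M))

def check(M=12, s=5):
    """Check both steps of the algebra for all integer x, y in [-M, M]."""
    return all(
        lines_two_and_three(x, y, M, s) == claimed_sum(x, y, M, s)
        and first_line(x, y, s) + claimed_sum(x, y, M, s) == bracket(x, y, M, s)
        for x in range(-M, M + 1) for y in range(-M, M + 1))

print(check())
```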
Proof of Proposition 3. When $M = 2N^{2/3}$, Proposition 5 gives
(0,0) = 1 + ^ + OiN-'^HogN) = 1 + 0(Af-i/6)^ 
and x = N^/^ + 0{N^/'^ log N), 

Let $H^N_I(x,0)$ denote the probability that the random walk started at $x$
hits $0$ before leaving $I$. Breaking things down according to the hitting
time of $0$:

$$G^N_I(x,0) = H^N_I(x,0)\, G^N_I(0,0),$$

which gives 

^i^(^'O) = + 0(iV-^/^logiV), 

which is the desired result. □ 
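The factorization of the Green's function through the hitting probability of 0 holds exactly for any killed walk, so it can be tested on a small example by solving the two linear systems in rational arithmetic. The sketch below is illustrative only: the toy walk (jumps uniform on $\{-J, \ldots, J\} \setminus \{0\}$), the sizes `M` and `J`, and the helper names `solve` and `green_and_hitting` are assumptions.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination over Fractions."""
    n = len(A)
    rows = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = 1 / rows[col][col]
        rows[col] = [v * inv for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [vr - factor * vc
                           for vr, vc in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

def green_and_hitting(M=8, J=3):
    """Toy walk (an assumption): jumps uniform on {-J..J} minus {0}, I = [-M, M]."""
    n = 2 * M + 1
    origin = M  # index of the site 0 in the list -M, ..., M
    step = Fraction(1, 2 * J)
    # Matrix of (identity - P) restricted to I; mass leaving I is killed.
    I_minus_P = [[(Fraction(1) if i == j else Fraction(0))
                  - (step if 0 < abs(i - j) <= J else Fraction(0))
                  for j in range(n)] for i in range(n)]
    e0 = [Fraction(1) if i == origin else Fraction(0) for i in range(n)]
    # Green's function column: G(x,0) = delta(x,0) + sum_z p(x,z) G(z,0).
    G = solve(I_minus_P, e0)
    # Hitting probability: H(0) = 1 and H is harmonic at every x != 0.
    A = [row[:] for row in I_minus_P]
    A[origin] = e0[:]
    H = solve(A, e0)
    return G, H, origin

G, H, origin = green_and_hitting()
print(all(G[i] == H[i] * G[origin] for i in range(len(G))))
```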

Lemma 5. If $0 < |x| < 2N^{2/3}$,



Proof. Taking $y = 0$ in Proposition 5 we see that

G'^{x,Q)<^ + CN-^lHogN. 

The result now follows from $H^N_I(x,0) = G^N_I(x,0)/G^N_I(0,0)$. □

7. Proof of Proposition 4. The proof relies on a local central limit
theorem, with bounds that take into account the dependence on $N$. First, we
need a few definitions. Let

$$\bar p_\ell(x) = \frac{1}{\sqrt{2\pi\sigma^2\ell}} \exp\!\left( -\frac{x^2}{2\sigma^2\ell} \right)$$

be the normal density with variance $\ell\sigma^2$. Let $p^N_k$ be the
distribution of the random walk at time $k$ when it starts at 0.
Proposition 6 (Local central limit theorem). Given a sequence of random
walks with jump probabilities $p^N$ satisfying assumptions 1-5, there is a
constant $C$, independent of $N$, such that for all $k \ge 1$ and all $x$ we
have

$$\bigl| p^N_k(x) - \bar p_{kN}(x) \bigr| \le \frac{C}{k^{3/2}\sqrt{N}}.$$
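To get a feeling for a local central limit bound of this kind, one can compute the exact $k$-step distribution of a toy walk by repeated convolution and compare it with the normal density of matching variance. This is only a numerical sketch under stated assumptions: the walk (jumps uniform on $\{-J, \ldots, J\} \setminus \{0\}$), the value `J=5` and the function names are illustrative, and the variance of one step plays the role of $\sigma^2 N$.

```python
import math

def k_step_distribution(step_pmf, k):
    """Exact distribution after k steps, by repeated convolution.

    step_pmf maps each jump size to its probability.
    """
    dist = {0: 1.0}
    for _ in range(k):
        nxt = {}
        for pos, p in dist.items():
            for jump, q in step_pmf.items():
                nxt[pos + jump] = nxt.get(pos + jump, 0.0) + p * q
        dist = nxt
    return dist

def max_lclt_error(J, k):
    """Return (largest gap between p_k and the matching normal density, step variance).

    Toy walk (an assumption): jumps uniform on {-J, ..., J} minus {0}.
    """
    step_pmf = {j: 1.0 / (2 * J) for j in range(-J, J + 1) if j != 0}
    var_step = sum(j * j * q for j, q in step_pmf.items())  # plays sigma^2 N
    var_k = k * var_step
    dist = k_step_distribution(step_pmf, k)

    def normal(x):
        return math.exp(-x * x / (2 * var_k)) / math.sqrt(2 * math.pi * var_k)

    err = max(abs(dist.get(x, 0.0) - normal(x))
              for x in range(-J * k, J * k + 1))
    return err, var_step

for k in (2, 4, 8, 16):
    err, var_step = max_lclt_error(J=5, k=k)
    # The rescaled error err * k^(3/2) * sqrt(var_step) is the quantity that a
    # bound of the form C / (k^(3/2) sqrt(N)) would keep bounded.
    print(k, err, err * k ** 1.5 * math.sqrt(var_step))
```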

The proof of this uses standard techniques but is rather lengthy so we 
begin by giving the 

Proof of Proposition 4. By translation invariance it is enough to
compute $a^N(x) = a^N(0,x)$. The local central limit theorem shows that, for
all $N$ and $k \ge 1$,

$$p^N_k(0) - p^N_k(x) = \bar p_{kN}(0) - \bar p_{kN}(x) + O\!\left( \frac{1}{k^{3/2}\sqrt{N}} \right).$$

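Therefore, after summing over $k \ge 0$,

$$
\begin{aligned}
a^N(x) &= \sum_{k=0}^{\infty} \bigl[ p^N_k(x) - p^N_k(0) \bigr] \\
&= -1 + \delta(x,0) - \sum_{k=1}^{\infty} \bigl[ \bar p_{kN}(0) - \bar p_{kN}(x) \bigr] + O(1/\sqrt{N}).
\end{aligned}
$$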

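We will now show that
$\sum_{k=1}^{\infty} [\bar p_{kN}(0) - \bar p_{kN}(x)] = |x|/\sigma^2 N + O(1/\sqrt{N})$.
Let $z = x/\sqrt{\sigma^2 N}$. Recalling the definition of $\bar p_{kN}$,

$$\sum_{k=1}^{\infty} \bigl[ \bar p_{kN}(0) - \bar p_{kN}(x) \bigr] = \frac{1}{\sqrt{2\pi\sigma^2 N}} \sum_{k=1}^{\infty} \frac{1}{\sqrt{k}} \bigl( 1 - e^{-z^2/2k} \bigr).$$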

Now the function $f_z(t) = (1 - e^{-z^2/2t})/\sqrt{2\pi t}$, being a decreasing
function divided by an increasing function, is decreasing in $t$ and
therefore

$$0 \le f_z(k) - \int_k^{k+1} f_z(t)\,dt \le f_z(k) - f_z(k+1)$$

and thus $\sum_{k=1}^{\infty} f_z(k) - \int_1^{\infty} f_z(t)\,dt \le f_z(1) \le 1$.
For the missing first piece of the integral we note

$$\int_0^1 f_z(t)\,dt = \int_0^1 \frac{1}{\sqrt{2\pi t}} \bigl( 1 - e^{-z^2/2t} \bigr)\,dt \le \int_0^1 \frac{1}{\sqrt{t}}\,dt = 2.$$

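Hence,

$$\sum_{k=1}^{\infty} \bigl[ \bar p_{kN}(0) - \bar p_{kN}(x) \bigr] = \frac{1}{\sqrt{\sigma^2 N}} \int_0^{\infty} f_z(t)\,dt + O(1/\sqrt{N}).$$

We have put the $\sqrt{2\pi}$ inside so that the integral is $-1$ times the
recurrent potential kernel for one-dimensional Brownian motion and hence is
equal to $|z|$. One can find this fact on page 103 in Durrett [7], or derive
it by changing variables so that $t$ is proportional to $1/u$ and doing some
calculus. In either case the result is

$$\sum_{k=1}^{\infty} \bigl[ \bar p_{kN}(0) - \bar p_{kN}(x) \bigr] = \frac{|z|}{\sqrt{\sigma^2 N}} + O(1/\sqrt{N}) = \frac{|x|}{\sigma^2 N} + O(1/\sqrt{N}).$$

Putting everything together we get

$$a^N(x) = -1 + \delta(x,0) - \frac{|x|}{\sigma^2 N} + O\!\left( \frac{1}{\sqrt{N}} \right),$$

which is the desired result. □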
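The core of this proof is that the sum of $f_z(k)$ over $k \ge 1$ differs from $\int_0^\infty f_z(t)\,dt = |z|$ by a bounded amount, uniformly in $z$. A quick numerical sketch of that comparison follows; it takes $f_z(t) = (1 - e^{-z^2/2t})/\sqrt{2\pi t}$, and the truncation level `terms` and the sample values of `z` are arbitrary choices made for illustration.

```python
import math

def f(z, t):
    """f_z(t) = (1 - exp(-z^2 / (2 t))) / sqrt(2 pi t)."""
    # expm1 avoids cancellation when z^2 / (2 t) is tiny.
    return -math.expm1(-z * z / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def truncated_sum(z, terms=200_000):
    """Partial sum of f_z(k), k = 1..terms; the neglected tail is O(z^2 / sqrt(terms))."""
    return sum(f(z, k) for k in range(1, terms + 1))

for z in (0.5, 2.0, 5.0, 10.0):
    s = truncated_sum(z)
    # The argument in the text gives sum_k f_z(k) = |z| + O(1).
    print(z, s, s - abs(z))
```

The printed differences stay of order one as $z$ grows, which after dividing by $\sqrt{\sigma^2 N}$ is the $O(1/\sqrt{N})$ error in the statement.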



Before entering into the proof of Proposition 6, we begin with an estimate
on $\phi^N$, the characteristic function of the displacement $S_1$. This is the
only proof that will require the use of assumption 3.



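Lemma 6. Let $L = N^{1/2}$. There are constants $a, b > 0$ so that, for all
$N$ and $|\theta| \le \pi$, we have

$$
|\phi^N(\theta)| \le
\begin{cases}
1 - b N \theta^2, & |\theta| \le 4/(2L+1), \\
1 - a, & |\theta| \in (4/(2L+1), \pi].
\end{cases}
$$

Consequently, given any $\epsilon \in (0, \pi]$, there is a $c > 0$, independent
of $N$, such that whenever $|\theta| > \epsilon/\sqrt{N}$,
$|\phi^N(\theta)| \le e^{-c}$.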



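Proof. We use assumption 3 to write $p^N(x) = b^N(x) + r^N(x)$ where
$b^N(x) = h/L$ for $|x| \le L$, $0$ otherwise, and
$r^N(x) = p^N(x) - b^N(x) \ge 0$. Define
$B^N = \sum_x b^N(x) = (2L+1)h/L \to 2h$ as $L \to \infty$, and note that
$\sum_x r^N(x) = 1 - B^N$. To bound $\phi^N$,

$$
\begin{aligned}
|\phi^N(\theta)| &\le \sum_x r^N(x) + \Bigl| \sum_{x=-L}^{L} b^N(x)\, e^{ix\theta} \Bigr| \\
&= (1 - B^N) + \frac{h}{L} \left| \frac{e^{i(L+1)\theta} - e^{-iL\theta}}{e^{i\theta} - 1} \right| \\
&= (1 - B^N) + \frac{h}{L}\, \frac{|\sin((2L+1)\theta/2)|}{|\sin(\theta/2)|},
\end{aligned}
\tag{17}
$$

where in the last step we have multiplied numerator and denominator by
$e^{-i\theta/2}$. Since the last expression is symmetric in $\theta$, we now
restrict ourselves to $\theta \in [0, \pi]$.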

For the next step we need the following inequalities. 
Lemma 7. (i) $\sin(\theta/2) \ge \theta/4$ for all $0 \le \theta \le \pi$.
(ii) If $x \ge 2$, $(\sin 2)/2 \ge (\sin x)/x$.

Proof. First we observe that $\cos x$ is decreasing on $[0, \pi/2)$ so if
$x \le \pi/2$, then

$$\frac{\sin x}{x} = \frac{1}{x} \int_0^x \cos y\,dy \ge \frac{2}{\pi} \int_0^{\pi/2} \cos y\,dy = \frac{2}{\pi} \ge \frac{1}{2},$$

which proves (i). For the second we note that

$$\left( \frac{\sin x}{x} \right)' = \frac{x\cos x - \sin x}{x^2}.$$

On $[2, \pi)$ the latter is negative since $\sin x > 0$ while $\cos x < 0$
there. Thus $(\sin 2)/2 \ge (\sin x)/x$ for all $x$ on this interval. For
$x \in [\pi, 2\pi)$ the same inequality is obvious since $\sin x \le 0$.
Finally, for $x \ge 2\pi$,
$(\sin x)/x \le 1/x \le 1/(2\pi) < 1/4 < (\sin 2)/2$, where we have used the
fact that $\sin 2 \approx 0.909 > 1/2$. □

Taking $x = (2L+1)\theta/2$, the inequalities in Lemma 7 imply that for
$|\theta| > 4/(2L+1)$

$$\frac{h}{L}\, \frac{\sin((2L+1)\theta/2)}{\sin(\theta/2)} \le \frac{h}{L} \cdot \frac{(\sin 2)(2L+1)\theta/4}{\theta/4} = B^N \sin 2,$$

so we have

$$|\phi^N(\theta)| \le 1 - (1 - \sin 2)\, B^N,$$

which gives the conclusion of Lemma 6 on this range of $\theta$.
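Dividing the last display by $B^N = (2L+1)h/L$ says that the characteristic function of a uniform variable on $\{-L, \ldots, L\}$ is at most $\sin 2$ in absolute value once $\theta > 4/(2L+1)$. The sketch below checks this on a grid and also confirms the closed form for the geometric sum; the grid size and the values of `L` are arbitrary illustrative choices.

```python
import math

def uniform_char_fn(theta, L):
    """E exp(i theta U) for U uniform on {-L, ..., L}, by direct summation.

    The imaginary parts cancel by symmetry, so only cosines are summed.
    """
    return sum(math.cos(theta * x) for x in range(-L, L + 1)) / (2 * L + 1)

def dirichlet_form(theta, L):
    """The same quantity as sin((2L+1) theta / 2) / ((2L+1) sin(theta / 2))."""
    return math.sin((2 * L + 1) * theta / 2) / ((2 * L + 1) * math.sin(theta / 2))

def worst_ratio(L, grid=2000):
    """Largest |E exp(i theta U)| over a grid of theta in (4/(2L+1), pi]."""
    lo = 4.0 / (2 * L + 1)
    thetas = [lo + (math.pi - lo) * j / grid for j in range(1, grid + 1)]
    return max(abs(dirichlet_form(t, L)) for t in thetas)

for L in (1, 3, 10, 50):
    # Lemma 7 gives the bound sin(2) on this range of theta.
    print(L, worst_ratio(L), math.sin(2))
```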

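For $\theta \le 4/(2L+1)$, we rewrite the second term on the right-hand side of
(17) as $B^N |E e^{i\theta U}|$ where $U$ is uniformly distributed on
$\{-L, -L+1, \ldots, L\}$ and use (3.7) on page 101 of Durrett [8] to conclude

$$\left| E e^{i\theta U} - \left( 1 - \frac{\theta^2 E U^2}{2} \right) \right| \le \frac{\theta^4 E U^4}{4!}.$$

To bound the moments we use

$$
\begin{aligned}
E U^2 &= \frac{2}{2L+1} \sum_{k=1}^{L} k^2 \ge \frac{2}{2L+1} \int_0^L x^2\,dx = \frac{2L^3}{3(2L+1)}, \\
E U^4 &= \frac{2}{2L+1} \sum_{k=1}^{L} k^4 \le \frac{2}{2L+1} \int_0^{L+1/2} x^4\,dx = \frac{2(L+1/2)^5}{5(2L+1)}.
\end{aligned}
$$

For $\theta \le 4/(2L+1)$

$$
\begin{aligned}
\frac{\theta^2}{2} E U^2 &\ge \theta^2\, \frac{N}{6} \cdot \frac{2L}{2L+1}, \\
\frac{\theta^4}{4!} E U^4 &\le \frac{\theta^2}{4!} \cdot \frac{16}{(2L+1)^2} \cdot \frac{2(L+1/2)^5}{5(2L+1)} = \theta^2\, \frac{N}{30} \left( 1 + \frac{1}{2L} \right)^2,
\end{aligned}
$$

so we have

$$
\begin{aligned}
|\phi^N(\theta)| &\le 1 - B^N \left( \frac{\theta^2}{2} E U^2 - \frac{\theta^4}{4!} E U^4 \right) \\
&\le 1 - B^N \theta^2 N \left( \frac{1}{6} \cdot \frac{2L}{2L+1} - \frac{1}{30} \left( 1 + \frac{1}{2L} \right)^2 \right).
\end{aligned}
$$

Even when $L = 1$, $(1/6) \cdot 2/3 = 1/9$ is larger than
$(1/30) \cdot (3/2)^2 = 3/40$ and we have proved Lemma 6. □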
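The moment bounds and the positivity of the final constant can be confirmed in exact arithmetic. The sketch below assumes the bounds in the form $EU^2 \ge 2L^3/(3(2L+1))$ and $EU^4 \le 2(L+1/2)^5/(5(2L+1))$; the function names and the range of `L` tested are illustrative choices.

```python
from fractions import Fraction

def moments(L):
    """Exact second and fourth moments of U uniform on {-L, ..., L}."""
    n = 2 * L + 1
    m2 = Fraction(2 * sum(k**2 for k in range(1, L + 1)), n)
    m4 = Fraction(2 * sum(k**4 for k in range(1, L + 1)), n)
    return m2, m4

def constant(L):
    """(1/6) * 2L/(2L+1) - (1/30) * (1 + 1/(2L))^2, the factor of theta^2 N."""
    return (Fraction(1, 6) * Fraction(2 * L, 2 * L + 1)
            - Fraction(1, 30) * (1 + Fraction(1, 2 * L)) ** 2)

def check_bounds(max_L=200):
    """Check both moment bounds and the lower bound 1/9 - 3/40 on the constant."""
    for L in range(1, max_L + 1):
        m2, m4 = moments(L)
        if m2 < Fraction(2 * L**3, 3 * (2 * L + 1)):
            return False
        if m4 > 2 * Fraction(2 * L + 1, 2) ** 5 / (5 * (2 * L + 1)):
            return False
        if constant(L) < Fraction(1, 9) - Fraction(3, 40):
            return False
    return True

print(check_bounds(), constant(1))
```

The constant is smallest at $L = 1$ because $2L/(2L+1)$ increases and $(1 + 1/(2L))^2$ decreases in $L$, which is why the proof only needs to examine that case.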



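Proof of Proposition 6. By assumptions 2 and 4, we have that

$$\phi^N(\theta) = 1 - \frac{\sigma^2 N \theta^2}{2} + N^2\, O(\theta^4). \tag{18}$$

The inversion formula gives

$$p^N_k(x) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{-i\theta x}\, \bigl( \phi^N(\theta) \bigr)^k\, d\theta.$$

Introducing new variables $s = \sqrt{Nk} \cdot \theta$ and $z = x/\sqrt{Nk}$, the
above is

$$\frac{1}{2\pi\sqrt{Nk}} \int_{-\sqrt{Nk}\,\pi}^{\sqrt{Nk}\,\pi} e^{-isz}\, \phi^N\!\left( \frac{s}{\sqrt{Nk}} \right)^{k} ds.$$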



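Now, by (18), we can find an $\epsilon > 0$, independent of $N$, such that if
$|s| \le \epsilon\sqrt{k}$, the following approximation is valid:

$$
\begin{aligned}
\phi^N\!\left( \frac{s}{\sqrt{Nk}} \right)^{k}
&= \exp\left\{ k \log\left( 1 - \frac{\sigma^2 s^2}{2k} + O\!\left( \frac{s^4}{k^2} \right) \right) \right\} \\
&= \exp\left\{ k \left[ -\frac{\sigma^2 s^2}{2k} + O\!\left( \frac{s^4}{k^2} \right) \right] \right\}
 = \exp\left( -\frac{\sigma^2 s^2}{2} \right) \exp(g(s,k)),
\end{aligned}
$$

where $|g(s,k)| \le c|s|^4/k$, and $c$ is independent of $N$. By choosing
$\epsilon$ smaller if necessary, we can guarantee that
$|g(s,k)| \le \sigma^2 s^2/4$.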

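As observed in Lemma 6, there is a $c$ independent of $N$ such that
$|\phi^N(\theta)| \le e^{-c}$ for $|\theta| > \epsilon/\sqrt{N}$, that is, for
$|s| > \epsilon\sqrt{k}$. Using these observations we can rewrite

$$
\begin{aligned}
p^N_k(x) ={}& \frac{1}{2\pi\sqrt{Nk}} \int_{|s| \le \epsilon\sqrt{k}} e^{-isz}\, e^{-\sigma^2 s^2/2}\, e^{g(s,k)}\, ds \\
&+ \frac{1}{2\pi\sqrt{Nk}} \int_{\epsilon\sqrt{k} < |s| \le \sqrt{Nk}\,\pi} e^{-isz}\, \phi^N\!\left( \frac{s}{\sqrt{Nk}} \right)^{k} ds.
\end{aligned}
$$

The second term is $\le e^{-ck}$ since the integrand is $\le e^{-ck}$ by Lemma 6
and the interval has length $\le 2\pi\sqrt{Nk}$.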

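Using $|e^{g(s,k)} - 1| \le c s^4/k$ if $|s| \le k^{1/4}$ and
$|e^{g(s,k)} - 1| \le \exp(\sigma^2 s^2/4)$ if
$k^{1/4} < |s| \le \epsilon\sqrt{k}$, we have

$$
\begin{aligned}
&\left| \frac{1}{2\pi\sqrt{Nk}} \int_{|s| \le \epsilon\sqrt{k}} e^{-isz}\, e^{-\sigma^2 s^2/2}\, \bigl( e^{g(s,k)} - 1 \bigr)\, ds \right| \\
&\qquad \le \frac{1}{2\pi\sqrt{Nk}} \left( \int_{|s| \le k^{1/4}} e^{-\sigma^2 s^2/2}\, \frac{c s^4}{k}\, ds + \int_{k^{1/4} < |s| \le \epsilon\sqrt{k}} e^{-\sigma^2 s^2/4}\, ds \right).
\end{aligned}
$$

Replacing $k^{1/4}$ by $\infty$ in the limits in the first integral, and using
$s^2 \ge |s| k^{1/4}$ in the second, the above is

$$\le \frac{1}{2\pi\sqrt{Nk}} \left( \frac{c}{k} + O\bigl( e^{-c k^{1/2}} \bigr) \right).$$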

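On the other hand, setting $z = x/\sqrt{Nk}$ and later $s = \theta\sqrt{Nk}$ we
have

$$
\begin{aligned}
\frac{1}{2\pi\sqrt{Nk}} \int_{|s| \le \epsilon\sqrt{k}} e^{-isz}\, e^{-\sigma^2 s^2/2}\, ds
&= \frac{1}{2\pi\sqrt{Nk}} \int_{-\infty}^{\infty} e^{-isz}\, e^{-\sigma^2 s^2/2}\, ds
 - \frac{1}{2\pi\sqrt{Nk}} \int_{|s| > \epsilon\sqrt{k}} e^{-isz}\, e^{-\sigma^2 s^2/2}\, ds \\
&= \bar p_{Nk}(x) + O(e^{-ck}),
\end{aligned}
$$

which proves the result. □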

APPENDIX 

Maruyama [11] considered a discrete-time Wright-Fisher model with a
ring of $2n$ colonies with $N$ diploid individuals, nearest-neighbor migration
with probability $m$ and mutation rate $u$ per generation. Here, to facilitate
comparison with Maruyama [11] we use his notation. In Section 6 he
considered sampling two individuals, one from the colony at 0 and the other
at $i$, and let $f_i$ be the probability the two were identical by descent.
This occurs if there is no mutation before the coalescence time $t_0$ so

$$f_i = E_i (1 - 2u)^{t_0}.$$

By writing recursive equations for the $f_i$ and then finding all of the
eigenvalues and eigenvectors of an associated matrix, he developed exact but
somewhat cumbersome formulas for the $f_i$. Writing $I_0$ instead of his $T$,
(6.7) says

$$f_0 = \frac{(1-u)^2}{(1-u)^2 + 2N/I_0},$$

where $u$ is the mutation probability per generation,

$$I_0 = \frac{1}{\pi} \int_0^{\pi} \frac{[1 - m(1 - \cos\theta)]^2}{1 - (1-u)^2 [1 - m(1 - \cos\theta)]^2}\, d\theta,$$

and $m$ is the migration probability per generation. We are interested in
the limiting behavior as $u \to 0$, so $I_0 \to \infty$. Since the contribution
to the integral over $[\epsilon, \pi]$ stays bounded, it is enough to
investigate the behavior near 0. Since $1 - \cos\theta \sim \theta^2/2$ as
$\theta \to 0$, the denominator should be well approximated by

$$1 - (1 - 2u)(1 - m\theta^2) \approx 2u + m\theta^2.$$

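Note that here $u$ and $\theta$ are small but $m$ need not be. Changing
variables $\theta = (2u/m)^{1/2} x$, we have

$$
\begin{aligned}
\int_0^{\epsilon} \frac{d\theta}{2u + m\theta^2}
&= \int_0^{\epsilon (m/2u)^{1/2}} \frac{1}{2u + 2u x^2} \left( \frac{2u}{m} \right)^{1/2} dx \\
&= \frac{1}{(2um)^{1/2}} \int_0^{\epsilon (m/2u)^{1/2}} \frac{1}{1 + x^2}\, dx \\
&\sim \frac{\pi/2}{(2um)^{1/2}},
\end{aligned}
$$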

where to evaluate the integral we have used the definition of the Cauchy
distribution (see, e.g., page 43 of Durrett [8]). Combining our calculations
gives

$$f_0 \approx \frac{1}{1 + 4N(2um)^{1/2}}.$$

Changing notation $m = \nu$, and setting $u = \lambda/(2L^2/\nu)$, we have

$$E_0\bigl( \exp(-\lambda t_0/(L^2/\nu)) \bigr) = f_0 \sim (1 + 4\lambda^{1/2} N\nu/L)^{-1},$$

from which (1) follows.
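The two computational steps of this appendix, the Cauchy-type integral and the final change of parameters, are easy to check numerically. The sketch below is illustrative only: it integrates the approximating integrand $1/(2u + m\theta^2)$ rather than Maruyama's exact expression, and the parameter values and the names `cauchy_type_integral`, `closed_form` and `f0_approx` are assumptions.

```python
import math

def cauchy_type_integral(u, m, upper=math.pi, n=200_000):
    """Midpoint-rule value of the integral of 1 / (2u + m theta^2) on [0, upper]."""
    h = upper / n
    return h * sum(1.0 / (2 * u + m * ((j + 0.5) * h) ** 2) for j in range(n))

def closed_form(u, m, upper=math.pi):
    """Exact value: arctan(upper * sqrt(m / (2u))) / sqrt(2 u m)."""
    return math.atan(upper * math.sqrt(m / (2 * u))) / math.sqrt(2 * u * m)

def f0_approx(N, u, m):
    """Limiting identity-by-descent probability 1 / (1 + 4 N sqrt(2 u m))."""
    return 1.0 / (1.0 + 4.0 * N * math.sqrt(2.0 * u * m))

u, m = 1e-5, 0.2
numeric = cauchy_type_integral(u, m)
leading = (math.pi / 2) / math.sqrt(2 * u * m)  # the small-u asymptotic value
print(numeric, closed_form(u, m), leading)

# Substituting m = nu and u = lam * nu / (2 L^2) turns 4 N sqrt(2 u m) into
# 4 sqrt(lam) N nu / L. The values of lam, nu, L_dist, N_col are illustrative.
lam, nu, L_dist, N_col = 2.0, 0.1, 50.0, 30
u_sub = lam * nu / (2 * L_dist ** 2)
print(f0_approx(N_col, u_sub, nu),
      1.0 / (1.0 + 4.0 * math.sqrt(lam) * N_col * nu / L_dist))
```

The gap between the numerical value and the leading term comes from cutting the arctangent off at a finite upper limit, and it shrinks as $u \to 0$.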